ABSTRACT

In this paper, the performance of the emerging MPEG-4 SVC CODEC is
evaluated. The first part gives a brief introduction to quality
assessment and the development of the MPEG-4 SVC CODEC. After that,
the test methodologies used are described in detail, followed by an
explanation of the actual test scenarios. The main part of this work
concentrates on the performance analysis of the MPEG-4 SVC CODEC,
both objective and subjective.
1. INTRODUCTION

As both high visual quality and low bandwidth requirements are key
features in the emerging mobile multimedia sector, MPEG and VCEG
introduced a new extension to the MPEG-4 AVC standard: scalable video
coding (SVC)^1. Its focus lies on supplying different client devices
with video streams suited to their needs and capabilities. This is
achieved by employing three different scalability modes: spatial,
temporal and SNR scalability. Because these new features are still in
development and their impact on visual quality has rarely been tested
independently, this paper covers this subject.

The performance evaluation is done using both objective and 
subjective assessment methods. Each method has different 
advantages: 

While subjective testing reflects the viewers' impressions best, it
has several downsides. It is much more time-consuming and therefore
also more expensive. Also, very small differences in video quality
cannot be reliably detected.
In contrast, objective analysis can be run automatically on computer
systems. Therefore, it is much cheaper and its results are easily
comparable. However, the objective metrics used to calculate the
quality scores never perfectly reflect the user experience.

^1 The SVC reference software reached Final Draft International
Standard status at the MPEG meeting in October 2008.

By using both subjective and objective test methodologies, the
advantages of each assessment method can be used to their full
potential. The comparison of the results also shows how far the
objective scores deviate from the viewers' subjective opinion.

In addition to the evaluations of visual quality, further test runs
are performed to check the encoding speed of the SVC CODEC, which is
also an important feature, especially in realtime encoding scenarios.

The assessment is divided into two separate parts: The first one is
an MPEG-4 SVC stand-alone test, which thoroughly examines the impact
of different encoding settings on the codec's performance. The second
part of the testing consists of a competitive comparison of the
MPEG-4 SVC reference CODEC, x264 (MPEG-4 AVC based) and Xvid (MPEG-4
ASP based), to analyze each CODEC's advantages and disadvantages in
different usage scenarios. All tests are described in detail in
section 5.

2. RELATED WORK 

Most of today's quality evaluations are run objectively because of
the previously mentioned high complexity and cost of subjective
assessments. Still, some comparisons of subjective and objective
assessment methods have been conducted. In particular, the CS MSU
Graphics & Media Lab Video Group ran several CODEC competitions
featuring various MPEG-4 ASP and AVC implementations [20] [21].

The emerging MPEG-4 SVC standard, however, has not been tested in
such a manner. Although both objective [26] and subjective tests [4]
have already been run separately, an analysis combining both test
methodologies was still outstanding. The results of the subjective
evaluation of the SVC reference CODEC [4] are limited to the quality
change if temporal levels are reduced in exchange for higher video
quality.

Besides that, the MPEG-4 SVC CODEC was also evaluated in an official
ISO test [9], which, however, did not assess a broad range of
quality-impacting parameters, but only tested a few basic features.
Another problem concerning this evaluation is that it only focused on
the comparison of MPEG-4 SVC and its direct predecessor MPEG-4 AVC.
No CODEC implementing the still commonly used MPEG-4 ASP standard was
rated in the comparison, nor were the performance impacts of the
encoding parameters of SVC evaluated.

In this paper, a broader range of quality-affecting settings and
scenarios is assessed, including both an SVC stand-alone test and a
competitive comparison of different CODECs, to provide a large-scale
overview of the current SVC codec's performance. In the SVC
stand-alone test, special attention is paid to the influence the new
scalable features of the SVC CODEC have on visual quality.
Additionally, a comparative synthesis that comprises both subjective
and objective test methods is conducted in this work.

3. USED TEST METHODOLOGIES 

To provide comparable results, it is important for both objective and
subjective assessments to be run under strictly specified conditions.
For objective tests, this means that the metric used, which
calculates the difference between an impaired and an original image,
and the encoding parameters are kept constant throughout the whole
assessment.
In addition to the requirements stated for objective testing,
subjective evaluations also need a fixed testing setup and
environment, as various influences, like noise or sunlight, can bias
a user's opinion.

The test methodologies used in the evaluation are thoroughly
described in the following.

3.1 Objective metrics 

3.1.1 PSNR

The PSNR is currently the most widely used metric for quality
evaluations of compression techniques. The result is given in the
logarithmic unit decibel (dB). Even though this metric can be
calculated for luminance as well as chrominance channels, it is
common to use only Y-PSNR, meaning that only the difference in
luminance is evaluated. PSNR is calculated using the following
equation:

$$\mathrm{PSNR} = 20 \cdot \log_{10}\left(\frac{\mathrm{MAX}}{\sqrt{\mathrm{MSE}}}\right), \quad \text{where} \quad \mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \lVert X(i,j) - Y(i,j) \rVert^2$$

Here X and Y are the original and the impaired image of size m x n,
and MAX is the maximum possible sample value (255 for 8-bit samples).
When calculating the PSNR for a sequence of pictures, the MSE is
calculated for the entire sequence and then inserted in the formula
above, instead of calculating the PSNR for each frame and then
calculating the mean [9]. The correlation of PSNR with subjective
quality impression is controversial: The results of the video quality
experts group [18] come to the conclusion that PSNR correlation is on
par with that of other metrics. In contrast, newer tests like [21]
claim that the correlation of PSNR is significantly lower than that
of the SSIM metric. Still, PSNR is the standard metric used in most
quality assessments and literature. To ensure comparability, this
metric will be used in the following tests too.
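
The difference between the two ways of averaging can be illustrated with a minimal Python sketch. The function names and the nested-list frame representation are assumptions made for this illustration only; a peak value of 255 (8-bit luminance) is assumed as well.

```python
import math


def mse(frame_a, frame_b):
    """Mean squared error between two equally sized luminance frames (lists of rows)."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr_from_mse(mse_value, max_value=255):
    """PSNR in dB for a given MSE; infinite if the images are identical."""
    if mse_value == 0:
        return math.inf
    return 20 * math.log10(max_value / math.sqrt(mse_value))


def sequence_psnr(ref_frames, test_frames, max_value=255):
    """Sequence PSNR as described above: average the MSE first, convert to dB once."""
    mses = [mse(r, t) for r, t in zip(ref_frames, test_frames)]
    return psnr_from_mse(sum(mses) / len(mses), max_value)


def mean_frame_psnr(ref_frames, test_frames, max_value=255):
    """The alternative that is not used here: mean of the per-frame PSNR values."""
    values = [psnr_from_mse(mse(r, t), max_value)
              for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)
```

Because the logarithm is concave, the mean of per-frame PSNR values is never lower than the PSNR of the averaged MSE, so the two approaches are not interchangeable when results are compared.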

3.1.2 PSNR adaptation for temporally scaled videos

As shown in [8], normal PSNR calculation is not suitable for quality
assessment of videos with temporal scalability. The calculated values
are too low to accurately reflect perceived quality, so the following
adapted quality score based on PSNR was proposed:
$QM = PSNR + m^{0.38} (30 - FR)$.
QM is the metric's score, FR is the framerate of the processed video.
To calculate PSNR in this equation, the frames of the temporally
scaled video are repeated to match the frame count of the original
sequence. The resulting sequence is then compared to the original
using standard PSNR calculation. The parameter m is the normalized
average magnitude of large motion vectors, which is used to measure
motion speed. The large motion vectors are the 25% of motion vectors
with the largest magnitude in the video sequence. After calculating
the average magnitude of the large motion vectors, this value is
normalized by the image width [8]. The equation was specifically
designed for videos with a maximum framerate of 30 Hz. As the source
videos used in this work have different framerates, the following has
to be considered: A simple adaptation of the equation to fit the new
source framerate ($QM = PSNR + m^{0.38} (60 - FR)$) does not lead to
reasonable results, so the impact of temporal decimation is only
considered if the framerate drops below 30 Hz. This means that
sequences with a framerate of 30 Hz or lower are always compared
against those with 30 Hz, so the metric described in [8] can be used
without modification.
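
The steps of this adapted score can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, motion vector magnitudes are assumed to be available as a plain list, and the exponent is exposed as a parameter whose default of 0.38 is an assumption rather than a value verified against [8].

```python
def repeat_frames(frames, factor):
    """Repeat every frame `factor` times so that a temporally scaled
    video matches the frame count of the original sequence."""
    return [frame for frame in frames for _ in range(factor)]


def normalized_motion(magnitudes, image_width, top_fraction=0.25):
    """Average magnitude of the largest 25% of motion vectors,
    normalized by the image width."""
    ordered = sorted(magnitudes, reverse=True)
    keep = max(1, int(len(ordered) * top_fraction))
    top = ordered[:keep]
    return (sum(top) / len(top)) / image_width


def quality_metric(psnr_db, m, frame_rate, exponent=0.38, reference_rate=30.0):
    """QM = PSNR + m**exponent * (reference_rate - FR).

    Framerates above the reference rate are clamped, so temporal
    decimation only counts once the framerate drops below 30 Hz."""
    effective_rate = min(frame_rate, reference_rate)
    return psnr_db + (m ** exponent) * (reference_rate - effective_rate)
```

With this clamping, a 60 Hz and a 30 Hz version of a sequence receive no framerate correction, while a 15 Hz version is corrected by `m**exponent * 15`.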

3.2 SAMVIQ 

The Subjective Assessment Methodology for Video Quality (SAMVIQ) was
developed by the EBU (European Broadcasting Union) between 2001 and
2004. It has since been incorporated in ITU-R BT.700 [15].
SAMVIQ was developed because most other subjective test methodologies
(for example DSIS, DSCQS, SSCQE and SDSCE) are specialized in rating
videos shown on TV screens, and not on home computers or mobile
devices.
At the beginning of the test process, the expert watches the
reference sequence. After that, the expert has to watch and rate all
impaired sequences, which are randomly ordered and anonymized by
labeling them alphabetically. If required, every sequence may be
repeated as often as the expert likes. It is also possible to change
the rating of a sequence at any time. The reference is also hidden
among the impaired sequences and is therefore rated as well.

For voting, a linear, continuous scale with a range of 0 to 100
points is used, where a higher value represents better image quality
and a lower one worse quality [15] [11].
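
The anonymized ordering with a hidden reference can be sketched as below. This is a hypothetical helper for illustration, not part of any SAMVIQ software; the sequence identifiers and function names are made up.

```python
import random
import string


def build_samviq_session(impaired_ids, reference_id="reference", seed=None):
    """Map anonymous letter labels (A, B, C, ...) to sequence ids.

    The reference is added once as a hidden item and shuffled in with
    the impaired sequences, so it is rated like any other sequence."""
    items = list(impaired_ids) + [reference_id]
    rng = random.Random(seed)
    rng.shuffle(items)
    return dict(zip(string.ascii_uppercase, items))


def record_vote(votes, label, score):
    """Store or overwrite a vote; ratings may be changed at any time."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie on the 0 to 100 scale")
    votes[label] = score
    return votes
```

Overwriting an existing entry in `votes` models the rule that an expert may revise a rating after watching other sequences.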



[Flowchart: the expert watches the reference sequence (the reference
can be repeated, if necessary); the expert selects an impaired
sequence and watches it; the sequence is rated on a continuous
quality scale.]

Figure 1: Schematic of the SAMVIQ test methodology.

4. TEST SETUP 

4.1 Selection of experts 

The people participating in the subjective assessment did not undergo
a special selection. A total of 21 persons of all ages and
occupations are included in the test. None of the experts was
previously trained as a subjective tester or had a job associated
with any kind of visual quality testing.

However, before a person is approved as an expert in the 
evaluation, two aptitude tests are run: A visual acuity and 
a color blindness test. 

Visual acuity is of great importance in subjective assessments of
video quality, because even small differences in visual quality have
to be detected by the expert. For that reason, the visual acuity of
every viewer is tested using the Freiburg Visual Acuity, Contrast &
Vernier Test (FrACT). This free program allows a computer-aided check
of, among other things, the visual acuity while complying with the
EN ISO 8596 standard. The process is thoroughly described in [3].
A minimum acuity of 1.0 is necessary to take part in the following
quality evaluation. Vision aids, like glasses or contacts, are
permitted in the test.

Color perception is also an important factor when assessing graphical
material. Persons with impaired color perception (like red-green
(protanopia, deuteranopia), blue-yellow (tritanopia) or total color
blindness (achromatopsia)) cannot reliably detect color aberrations,
which are a common error in video compression, and are therefore
excluded from the test. This test is executed using the standard
Ishihara test charts.
After these tests, one person had to be excluded, leaving a total of
20 test subjects for the subjective assessment.



4.2 Subjective test environment 

The testing environment is set up as follows:
To prevent any unwanted display-related influences, the same device
(a Samsung R40-T5500 Cinoso notebook; further technical details are
shown in table 13 of section 6.2.2) is used for every test session
and expert. The LCD supports a resolution of up to 1280x800 pixels
and a luminance of up to 200 cd/m². The black level and contrast of
the display are adjusted using a PLUGE (Picture Line-Up Generation
Equipment) pattern. PLUGE patterns vary in format, but a typical
pattern consists of at least three vertical bars (called PLUGE
pulses) with different shades of black and dark gray. The adjustment
process and the generation of a PLUGE pattern are not described here
in more detail; further information is provided in [25].
During the playback of the sequences, the test room's background
lighting is provided by a faint, artificial light source. The display
is shielded from direct light to eliminate reflections. Daylight and
other outside influences are also avoided as much as possible. The
viewing distance is set according to the rules of Preferred Viewing
Distance (PVD) for a 15.4" LCD device.

The display is aligned both horizontally and vertically to provide a
viewing angle of < 20° to the expert, which is well inside the
recommended parameters (viewing angle < 30°) stated in [12].

4.3 Encoder settings 

Three CODECs are assessed in the comparison: Xvid 1.1.3 (MPEG-4 ASP),
x264 core 59 r808bm ff5059a (MPEG-4 AVC) and the new MPEG-4 SVC
reference encoder 9.12.2. All encoder parameters are kept at default
settings except for the settings listed in tables 1 and 2.



Encoding Type      Single pass - bitrate-based (ABR)
Max consecutive    2
Threads            4

Table 1: x264 encoder settings.


GOPSize            4
SearchMode         4
BaseLayerMode      2

Table 2: SVC encoder settings.



The 'GOPSize' parameter is changed to a value of 4 to enable 
the usage of B frames. Encoding a video sequence without 
B frames would result in a significant drop in compression 
efficiency. 

The fast search algorithm is used, so 'SearchMode' is ad- 
justed to 4. 

The parameter 'BaseLayerMode' is altered as the default 
setting is invalid. 

5. CONDUCTED EVALUATIONS 

The assessment is split into two separate evaluations: Firstly, the
MPEG-4 SVC CODEC is tested in a stand-alone test, to document the
impact of different encoder settings on the resulting quality and to
assess the CODEC's features.
Secondly, the characteristics of the MPEG-4 SVC CODEC are compared to
those of x264 and Xvid in a comparison test.

5.1 MPEG-4 SVC stand-alone test 

5.1.1 Encoding parameter test 

Comparison of different block matching metrics. First, 
the different metrics available for block matching will be 
compared. There are four different options available for 
FullPel and SubPel: 

• SAD 

• SAD-YUV (Not available for SubPel estimation) 

• SSE 

• HADAMARD 



Metrics can be chosen independently for FullPel and SubPel 
calculations. In this test, the different metrics are compared 
in terms of impact on encoding speed and visual quality. 
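
For readers unfamiliar with these metrics, the following Python sketch shows how SAD, SSE and a Hadamard-based cost (often called SATD) can be computed for a single 4x4 block. It is an illustration only: the 4x4 block size and the missing normalization are assumptions, and the code does not reproduce the reference encoder's implementation.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def sse(block_a, block_b):
    """Sum of squared errors between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


# Symmetric 4x4 Hadamard matrix
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def hadamard_4x4(block):
    """Unnormalized 2D Hadamard transform H * block * H of a 4x4 block."""
    tmp = [[sum(H4[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * H4[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(block_a, block_b):
    """Hadamard cost: sum of absolute transformed differences."""
    diff = [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(block_a, block_b)]
    return sum(abs(c) for row in hadamard_4x4(diff) for c in row)
```

The Hadamard cost needs a transform per candidate block in addition to the difference calculation, which gives an intuition for why it is the most expensive of the three options.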



Effect of different block matching algorithms and search parameters.
Two block matching algorithms are available: block search and fast
search. Block search is an algorithm usually referred to as full
search or exhaustive search. Without restraints in search range, it
offers perfect prediction, but at the cost of extremely high
computational complexity, as all possible blocks have to be compared.
The second option, named fast search, is, as the name indicates, a
much faster alternative. The developers claim that the loss of
precision is by far outweighed by the speed increase obtained by this
algorithm [19]. This test will show how the combinations of search
range parameters, defined by the variables 'SearchRange',
'BiPredIter' and 'IterSearchRange' in the encoder configuration, and
block matching algorithms perform in terms of encoding speed and
visual quality.
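
The cost of exhaustive search can be made concrete with a small Python sketch. It is a simplified illustration, not the reference encoder's algorithm: the function names are hypothetical, SAD is used as the matching metric, and the optional `early_break` threshold only mimics the general idea of stopping early that many fast search algorithms use.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def block_search(cur, ref, top, left, size, search_range, early_break=None):
    """Return (dy, dx, cost, candidates_tested) for one block.

    With early_break=None every candidate inside the search range is
    tested (exhaustive search). Otherwise the search stops as soon as
    a candidate's cost is at or below the threshold."""
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    tested = 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate lies outside the reference frame
            cost = sad(target, get_block(ref, y, x, size))
            tested += 1
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
            if early_break is not None and cost <= early_break:
                return best + (tested,)
    return best + (tested,)
```

The number of candidates grows with the square of the search range, which is why the search range parameters matter so much for the encoding time of the exhaustive variant.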





'Foreman'
BLB          262
ELB (CGS)    1077
ELB (MGS)    1289
EBs          500, 1000, 1500

Table 3: Encoding and extraction bitrates used in the CGS / MGS test.



5.1.2 Quantization parameter test 

During this test, the impact of the quantization parameter (QP) on
the video quality is evaluated. The value of the QP changes the
strength of the quantization: The higher the QP, the stronger the
quantization of the sequence and the lower the resulting video
quality. Any integer between 0 and 51 can be selected as the QP
value. The QP can either be a constant integer or, using rate
control, be adjusted dynamically to match a previously selected
bitrate.

For the evaluation, the 'Foreman' (CIF, 30 Hz), 'Crew' (4CIF, 60 Hz)
and 'Pedestrian Area' (720p, 25 Hz) sequences are each encoded with a
single layer and constant QPs of 0, 10, 20, 30, 40 and 50. These
sequences are used as they provide a wide range of different motion
and spatial details. All other encoder settings are left at standard
values. So, for each sequence six videos are made and evaluated in
the test.
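
As background on why equal QP steps do not mean equal quantization steps, the following sketch uses the commonly cited H.264/AVC relationship in which the quantizer step size doubles for every increase of 6 in QP. This is general codec knowledge added for illustration, not a result of this paper, and the closed-form expression is an approximation of the standard's step size table.

```python
def approx_qstep(qp):
    """Approximate H.264/AVC quantizer step size for a QP in 0..51.

    The step size is 1.0 at QP 4 and doubles every 6 QP values."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be an integer between 0 and 51")
    return 2 ** ((qp - 4) / 6)


# Step sizes for the constant QP values used in this test
for qp in (0, 10, 20, 30, 40, 50):
    print(qp, round(approx_qstep(qp), 2))
```

Under this approximation, the step from QP 30 to QP 40 increases the quantizer step size by roughly a factor of three, just like the step from QP 0 to QP 10, although the absolute step sizes differ greatly.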

5.1.3 Optimal quantization parameter test 

By using the results of the quantization parameter test and 
the filesizes of the encoded sequences, a QP range in which 
the optimal ratio of filesize and visual quality is achieved is 
pinpointed. To get exact results, each of the evaluated se- 
quences ('Foreman', 'Crew', 'Pedestrian Area') is addition- 
ally encoded with 9 different QP settings ranging from 31 
to 39 in single steps. These impaired sequences are then 
assessed. 

With the resulting quality scores and related filesizes for 
each QP setting, the exact location of the optimal quantiza- 
tion parameter is calculated. 

5.1.4 CGS / MGS test

In the coarse grain scalability (CGS) / medium grain scalability
(MGS) test, the impact of MGS on the video quality is assessed in
comparison with CGS coding. To do so, the three sequences already
used in the quantization parameter test ('Foreman', 'Crew',
'Pedestrian Area') are encoded with two layers (base layer (BL) and
enhancement layer (EL)). In CGS mode, only these two layers, using
SNR scalability, can be extracted, while the sequence encoded with
MGS additionally offers 4x4 MGS vectors to adjust dynamically to
changing bandwidth needs. Except for the two layers, the standard
encoding settings are employed.
During the test, three different bitrates are compared. For each
bitrate setting, a video stream is extracted from the SVC file. The
three sequences of each test video are then evaluated by the test
subjects to determine whether MGS has an impact on perceived quality
in this setting and how large it is. The encoding bitrates for each
layer (base layer bitrate (BLB), enhancement layer bitrate (ELB)) and
the extraction bitrates (EBs) are:





'Crew'
BLB          2409
ELB (CGS)    10707
ELB (MGS)    10101
EBs          3000, 7000, 11000

Table 4: Encoding and extraction bitrates used in the CGS / MGS test.





'Pedestrian Area'
BLB          1353
ELB (CGS)    4719
ELB (MGS)    5644
EBs          3000, 4500, 6000

Table 5: Encoding and extraction bitrates used in the CGS / MGS test.



As MGS encoding introduces an additional overhead to the SVC stream
due to the availability of multiple MGS vectors, the ELB bitrates of
CGS and MGS sequences differ.

5.1.5 Best extraction path test

As the different video streams embedded in an SVC bitstream are
arranged in a spatio-temporal cube, the best extraction path test is
conducted to determine which of the video streams is perceived as the
optimal one for a given bitrate in terms of visual quality.

To achieve this, the unimpaired original 4CIF sequences are encoded
in three spatial (QCIF, CIF, 4CIF) and four temporal (7.5 Hz, 15 Hz,
30 Hz, 60 Hz) resolutions each. The QP of each layer is adjusted to
match the target filesize of 1000 KB. The resulting 12 impaired
sequences are compared in the evaluation.

The outcome of the best extraction path test shows which of the three
kinds of impairments (spatial, temporal or SNR) has the biggest
impact on perceived quality and, as a result, whether there is an
extraction path that can generally be recommended or whether the
results are highly dependent on the content of the encoded sequence.
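
The 12 operating points and their ranking can be sketched in a few lines of Python. The names below are hypothetical and the quality scores passed to `rank_points` would come from the objective or subjective evaluation; none are taken from this paper.

```python
from itertools import product

SPATIAL = ["QCIF", "CIF", "4CIF"]
TEMPORAL_HZ = [7.5, 15, 30, 60]


def extraction_points():
    """All spatial/temporal combinations, i.e. the 12 impaired sequences."""
    return [(res, fps) for res, fps in product(SPATIAL, TEMPORAL_HZ)]


def rank_points(scores):
    """Turn a {(resolution, framerate): quality score} mapping into
    ranks, where 1 is the best and len(scores) the worst rating."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {point: rank for rank, point in enumerate(ordered, start=1)}
```

Since all 12 sequences are tuned to the same target filesize, the rank of a point directly reflects how well that combination of spatial, temporal and SNR impairment is tolerated.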

5.1.6 Packet loss test

The SVC codec's scalable features are most advantageous in streaming
environments, especially the Internet, where the available bandwidth
of each client differs significantly. Besides the bandwidth, the
response time is also an important aspect of the connection. To
provide low delays, multimedia servers nearly exclusively rely on
connections over RTP, which is based on UDP [t] [Tt]. While providing
small delays and timestamps (among other features), this protocol has
the severe disadvantage that no error correction is supported. As a
result, transmissions over error-prone channels manifest themselves
in visual impairments of the streamed video file.

To test the behavior of the SVC CODEC in the case of 
errors, an error recovery test was conducted using the 'Fore- 
man' sequence. The file was then encoded using: 



• The standard encoding settings, containing the following
bitstreams after encoding:

    Layer-ID 0    352x288, 7.5 Hz, 180.9 kbps
    Layer-ID 1    352x288, 15 Hz, 216.7 kbps
    Layer-ID 2    352x288, 30 Hz, 257.7 kbps

• Two layers with spatial scalability (QCIF & CIF resolution),
containing the following bitstreams after encoding:

    Layer-ID 0    176x144, 7.5 Hz, 60.2 kbps
    Layer-ID 1    176x144, 15 Hz, 77.2 kbps
    Layer-ID 2    176x144, 30 Hz, 94.5 kbps
    Layer-ID 3    352x288, 7.5 Hz, 240.8 kbps
    Layer-ID 4    352x288, 15 Hz, 294.2 kbps
    Layer-ID 5    352x288, 30 Hz, 353.0 kbps

• Two layers with SNR scalability (QP 36 & QP 26), containing the
following bitstreams after encoding:

    Layer-ID 0    352x288, 7.5 Hz, 104.1 kbps
    Layer-ID 1    352x288, 15 Hz, 136.8 kbps
    Layer-ID 2    352x288, 30 Hz, 175.4 kbps
    Layer-ID 3    352x288, 7.5 Hz, 696.4 kbps
    Layer-ID 4    352x288, 15 Hz, 845.8 kbps
    Layer-ID 5    352x288, 30 Hz, 1010.1 kbps



As temporal scalability is already present in the SVC file encoded
with standard settings, this feature was not evaluated separately.

During the evaluation, packet loss was simulated using the packet
loss simulation tool (PacketLossSimulatorStatic.exe), which is
included in the current SVC build. Further details concerning this
tool are provided in [14]. The tested sequences were exposed to four
levels of error: 3%, 5%, 10% and 20%. The impact of the errors on the
video quality and the resulting impairments were then evaluated for
each single bitstream.

5.2 Comparison of MPEG-4 SVC to MPEG-4 AVC/ASP

5.2.1 Quality comparison test

During the quality comparison test, nine test sequences are encoded
with the three evaluated CODECs Xvid, x264 and SVC. The CIF sequences
are encoded with 200 kbps, the 4CIF and HD sequences with 1000 kbps.

In the subjective assessment, the experts are then asked to evaluate
the sequences: In each test, the subject is first shown the
uncompressed reference sequence. After that, the three impaired
versions of the same sequence, compressed with the three evaluated
CODECs, are compared to the original.

During the objective evaluation, the three impaired versions of each
sequence, encoded with the tested CODECs, are compared to each other.

The results of this test show which of the three CODECs produces the
best quality on average, and whether and how strongly bitrate and
resolution affect the quality of each CODEC.

5.2.2 Encoding speed test

In the encoding speed test, the time each CODEC needs to encode a
given sequence is measured. For this evaluation, the standard encoder
settings are employed. For the encoding process, three sequences
('Foreman', 'Crew' and 'Pedestrian Area') with different resolutions
and a duration of 10 seconds each are used.

The sequences are looped 3 times before the encoding process with
Xvid or x264 to reduce measuring inaccuracies. This is necessary as
the encoding times with these CODECs are very short for the
non-looped sequences. SVC encoding, in contrast, is unproblematic in
this respect due to its lower encoding speed. In addition to testing
all CODECs with their standard settings, the speed of the x264 CODEC
is also evaluated with the parameter 'Threads' reduced to '1', to
investigate the impact multithreading has on its encoding speed.

6. RESULTS



6.1 MPEG-4 SVC stand-alone test 

First, the results from different tests regarding the SVC op- 
tions are compared. It has to be mentioned that some tests 
could only be performed using objective metrics as the dif- 
ferences in quality are too small to be evaluated subjectively. 



6.1.1 Encoding parameter test 

Motion estimation. As the following paragraphs show, motion
estimation has only little impact on visual quality or bitrate.
However, these options have a strong influence on encoding speed.

First, the results of the comparison of different block matching
metrics are presented. As the results in table 6 show, the effect of
different block matching metrics on Y-PSNR is rather small, but the
choice of metric can have a major impact on encoding time.



FullPel      SubPel       ΔEnc. time     ΔY-PSNR
SAD          SAD          ±0.0000%       ±0.0000%
SAD          SSE          +4.4902%       -0.2384%
SAD          HADAMARD     +8.0449%       +0.2517%
SSE          SAD          +4.3966%       -0.0086%
SSE          SSE          +9.6352%       -0.2442%
SSE          HADAMARD     +9.8223%       +0.2282%
HADAMARD     SAD          +143.5921%     +0.0487%
HADAMARD     SSE          +153.0402%     -0.2012%
HADAMARD     HADAMARD     +146.9598%     +0.2785%
SAD-YUV      SAD          +33.9570%      +0.0036%
SAD-YUV      SSE          +36.9504%      -0.2328%
SAD-YUV      HADAMARD     +29.5603%      +0.2563%

Table 6: Impact of different block matching metrics.



Another interesting fact is that Y-PSNR is mostly independent of the
FullPel block matching metric. Concerning the SubPel metric,
'HADAMARD' always reaches the highest Y-PSNR values, while 'SSE'
reaches the lowest, regardless of the FullPel metric used. On the
other hand, the required processing time depends primarily on the
chosen FullPel metric; 'SAD' is the best choice here. 'HADAMARD' is a
poor choice for FullPel as it leads to a significant increase in
encoding time while Y-PSNR does not profit much from it. In
conclusion, a combination of 'SAD' as FullPel and 'HADAMARD' as
SubPel metric can be recommended.

The next aspect analyzed is the effect of different block matching
algorithms. Among the results, the most striking point is that block
search has little impact on visual quality but leads to a very large
increase in encoding time, as table 7 shows.



Algorithm       ΔEnc. time     ΔFilesize     ΔY-PSNR
Block search    ±0.0000%       ±0.0000%      ±0.0000%
Fast search     -96.0624%      ±0.8825%      ±0.0066%

Table 7: Impact of different block matching algorithms.
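
The encoding time change in table 7 is stated relative to block search. A short Python calculation (the helper name is hypothetical) shows how the same figure reads when fast search is taken as the baseline instead:

```python
def relative_change_inverse(change_percent):
    """If B differs from A by change_percent, return by how many
    percent A differs from B."""
    ratio = 1 + change_percent / 100  # B / A
    return (1 / ratio - 1) * 100


# Fast search needs 96.0624% less time than block search (table 7)
print(round(relative_change_inverse(-96.0624), 1))
```

The result is an increase of roughly 2440%, which agrees, within the rounding of the table value, with the increase in encoding time of 2439.65% quoted for block search in this section.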



Even when normalizing the bitrates, the advantage of block search in
Y-PSNR values increases only very slightly, and the gain remains very
small compared to the increase in encoding time of +2439.65%. In
common scenarios, 'Fast search' is the much more feasible choice of
the two algorithms. An interesting fact is that the processing time
needed for block search is independent of the motion present in the
source video, whereas this is not the case for the fast search
algorithm. The exact values are given in table 8.

Sequence      Encoding time per frame [s]
'Bus'         0.9333
'Football'    1.1846
'Foreman'     0.7767

Table 8: Dependence of fast search on source videos.

This is due to the early break criteria present in most fast search
algorithms. As soon as the chosen block matching metric value falls
under a certain threshold for the considered block candidate, any
further evaluation of candidates for this block is omitted. This
explains why fast search takes most time for high motion ('Football'
sequence) and least for low motion content ('Foreman' sequence).

6.1.2 Quantization parameter test

Figure 2 shows a comparison of objective and subjective quality
scores obtained in the quantization parameter test. As is easily
visible, both scores differ significantly: While the objective score
degrades almost linearly with the rising QP value, the subjective
score shows very little quality impairment up to a QP value of 30,
but then quickly falls to a relative score of about 25% at QP 40.

Figure 2: Normalized average marks of the objective and subjective
quality in the quantization parameter test.

This test shows that there is a significant gap between PSNR and
subjectively perceived quality. Apparently, a certain amount of loss
in high frequency information does not impair perceived quality much,
although this loss is already picked up by the PSNR calculation. The
intersection of both graphs is at about QP 33, where both scores
reach about 61% relative quality.



6.1.3 Optimal quantization parameter test

In both objective and subjective testing, an optimal choice for the
quantization parameter is derived. In this evaluation, the optimal QP
setting is located where the best relation of filesize and visual
quality is present:
$\max\left(\mathrm{normalize}\left(\frac{1}{\mathrm{Filesize}}\right) + \mathrm{normalize}(\mathrm{Visual\ Quality})\right)$.
First, the whole range of the possible quantization parameter
settings is tested in steps of 10 points. The results are shown in
figure 3.




[Plot: normalized objective and normalized subjective scores over the
quantization parameter]

Figure 3: Approximation of the optimal objective and subjective
quantization parameter.



Using this result, a fine granular search for the optimum
quantization parameter value is conducted. Figure 4 shows the
significant differences between objective and subjective results.

[Plot: normalized objective and normalized subjective scores over
quantization parameter values 31 to 46]

Figure 4: Optimal objective and subjective quantization parameter.



While the objective score reaches its maximum at QP 43, the
subjective maximum is at QP 36. The reason for this discrepancy is
the high subjective quality loss in the region between QP 30 and 40,
while Y-PSNR values show only moderate decreases in this region.
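
The selection criterion of this test can be sketched in Python as follows. Two points are assumptions made for this illustration: the filesize term is read as the reciprocal of the filesize, and min-max scaling is used for normalization, since the normalization method is not specified here. The function names are hypothetical.

```python
def normalize(values):
    """Min-max normalize a list of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def optimal_qp(qps, filesizes, qualities):
    """Return (qp, score) maximizing
    normalize(1 / filesize) + normalize(quality)."""
    inverse_size = normalize([1 / size for size in filesizes])
    quality = normalize(qualities)
    scores = [a + b for a, b in zip(inverse_size, quality)]
    best = max(range(len(qps)), key=scores.__getitem__)
    return qps[best], scores[best]
```

Because both terms are normalized before they are added, a QP in the middle of the range wins whenever quality falls slowly at first while the filesize already shrinks considerably. This is why the objective and subjective optima can differ as soon as the two quality curves have different shapes.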



6.1.4 CGS / MGS test

The CGS / MGS test shows similar results in both objective and
subjective evaluation. At bitrates between the two SNR layers, MGS
encoding can lead to a significant increase in quality. Figures 5 and
6 show the relative gain of objective and subjective values using 4
MGS vectors.



Figure 5: Objective mean scores for CGS and MGS coded sequences.

[Plot; axis label: ITU-R averaged mark]

Figure 6: Subjective mean scores for CGS and MGS coded sequences.



As the objective tests show, the quality level assigner tool can be
used to achieve an almost linear PSNR increase with a low number of
MGS vectors, which can be seen in figure 7.

[Plot; x-axis: Extraction Bitrate]

Figure 7: Comparison of the 'Foreman' sequence using 2x8 MGS vectors
with and without the quality level assigner tool.



6.1.5 Best extraction path test

While the objective best extraction path assessment showed the best
PSNR values for sequences encoded in 4CIF resolution and 30 / 60 Hz,
in subjective testing the bitstream using the highest possible
spatial and temporal level in particular is rated very poorly. This
finding matches the one previously mentioned in the quantization
parameter test, where the subjective quality ratings suddenly drop
between QP 30 and 40, whereas the objective scores scale almost
linearly throughout the whole QP range. Because the QP had to be
adjusted higher than 40 when 4CIF resolution and a framerate of 60 Hz
are used in the 'Harbour' and 'Crew' sequences, the corresponding
quality scores are much lower in the subjective test than in the
objective one. In the following figures, the numbers from 1 to 12
indicate the visual quality of each selectable bitstream, where 1 is
the best and 12 the worst rating.
Figure 8: Objective and subjective quality marks for
different framerates and resolutions.
Figure 9: Subjective quality marks for different
framerates and resolutions.

Apart from that, QCIF resolution as well as all streams
encoded with a framerate of 7.5 Hz received very low scores
in both test runs. As a result, selecting the lowest spatial
and/or temporal resolution should be avoided as far as
possible.

The differences between the objective and the subjective
tests are visualized in the following figure, which shows
the difference between the results of both assessments.
Scores higher than 0 show that a video stream received a
higher rating in the objective evaluation, while negative
marks indicate that the subjective rating is higher than
the objective one.

Figure 10: Comparison of mean objective and subjective
quality marks for different framerates and resolutions.

6.1.6 Packet loss test

The results of the error recovery test differed between the
single-layer encode and the encodes using multiple layers.
The detailed results are listed below for each encoding
setting.

Standard encoding parameters: When the standard encoding
parameters were used, the decodable video duration was
shortened linearly with the packet error rate. In this case,
this applied to bitstreams with high as well as low
layer-IDs. The image artifacts manifested as blocking and
color distortions, which increased with the packet error
rate.

Error rate   Layer   # decodable   ITU-R mark
3%           0       71            2.42
3%           1       143           3.79
3%           2       289           5.96
5%           0       70            0.75
5%           1       142           1.53
5%           2       280           5.79
10%          0       64            0.80
10%          1       130           2.56
10%          2       265           5.42
20%          0       57            0.78
20%          1       116           1.14
20%          2       235           2.34

Table 9: Impact of packet errors on the decodable
video duration and subjective quality.



The duration of the unimpaired sequences depends on the
framerate used; the full numbers of frames are therefore
300 for 30 Hz (layer-ID 2), 150 for 15 Hz (layer-ID 1) and
75 for 7.5 Hz (layer-ID 0).

This applies to the following test scenarios too; only the
number of layer-IDs is doubled because of the additional
enhancement layer. The full duration is therefore 300 for
layer-IDs 2 & 5, 150 for layer-IDs 1 & 4 and 75 for
layer-IDs 0 & 3.



Spatial scalability: The video file with spatially scalable
layers showed a different behavior during the assessment.
Even low error rates had a serious impact on the bitstreams
with layer-IDs below 3. When the error rate reached 5%,
layer-ID 0 was not decodable; at 10%, none of the three
smallest bitstreams was viewable anymore. The layer-IDs
above 3 were still decently viewable up to an error rate of
5%; beyond that, the decodable duration of the sequence was
shortened to 55.83% on average. Additionally, severe color
distortions were already present in files that were
processed with a 5% error rate.
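
The 55.83% figure can be retraced from the frame counts. The following is a minimal sketch, assuming that the value refers to layer-IDs 4 and 5 at an error rate of 10% (frame counts taken from Table 10) and that the full frame counts per layer-ID are the ones stated above. The function and variable names are illustrative only.

```python
# Full frame counts per layer-ID, as stated in the text
# (300 frames at 30 Hz, 150 at 15 Hz, 75 at 7.5 Hz).
FULL_FRAMES = {0: 75, 1: 150, 2: 300, 3: 75, 4: 150, 5: 300}


def decodable_fraction(layer_id, decoded_frames):
    """Share of the unimpaired sequence that could still be decoded."""
    return decoded_frames / FULL_FRAMES[layer_id]


# Spatial scalability at 10% error rate (Table 10):
# layer-ID -> number of decodable frames.
# Assumption: the mean in the text is taken over layer-IDs 4 and 5.
decoded_at_10_percent = {4: 86, 5: 163}

fractions = [
    decodable_fraction(layer_id, frames)
    for layer_id, frames in decoded_at_10_percent.items()
]
mean_fraction = sum(fractions) / len(fractions)

print(f"{100 * mean_fraction:.2f}%")  # prints 55.83%
```

The same helper can be applied to the other tables to compare how quickly the decodable duration shrinks for each encoding setting.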



Error rate   Layer   # decodable   ITU-R mark
3%           0       23            0.91
3%           1       58            2.58
3%           2       22            1.82
3%           3       64            1.92
3%           4       135           5.37
3%           5       280           6.83
5%           0       -             -
5%           1       9             0.36
5%           2       31            0.92
5%           3       61            0.99
5%           4       130           1.21
5%           5       236           1.32
10%          0-2     -             -
10%          3       67            2.16
10%          4       86            3.48
10%          5       163           3.85
20%          0-2     -             -
20%          3       50            0.83
20%          4       107           0.49
20%          5       110           0.59

Table 10: Impact of packet errors on the decodable
video duration and subjective quality.



SNR scalability: The SNR-scalability test sequences suffered
duration shortenings similar to those of the spatially
scalable ones. The problem that the three lower bitstreams
are not decodable once the error rate reaches 10% also
reappears in the SNR-scalability test. The visual
impairments, however, manifested as a mixture of blocking
artifacts, reference frame errors and complete picture
losses lasting for one frame.



Error rate   Layer   # decodable   ITU-R mark
3%           0       27            1.31
3%           1       62            6.25
3%           2       26            1.89
3%           3       64            2.93
3%           4       135           6.81
3%           5       293           6.25
5%           0       2             0.22
5%           1       13            0.40
5%           2       35            0.48
5%           3       61            0.55
5%           4       114           0.76
5%           5       152           0.76
10%          0-2     -             -
10%          3       67            1.88
10%          4       75            2.85
10%          5       117           2.87
20%          0-2     -             -
20%          3       49            0.55
20%          4       123           0.85
20%          5       48            0.61

Table 11: Impact of packet errors on the decodable
video duration and subjective quality.



Tables 9, 10 and 11 show the encoding settings, error
rates, layer-IDs, number of decodable video frames and
the corresponding ITU-R averaged mark.



In conclusion, when comparing the subjective quality marks,
it is visible that the encode using standard settings
suffers less from packet loss on average than any bitstream
with multiple scalable layers. The quality developed as
expected in this test case: with increasing error rate, the
visual quality decreased steadily.

If scalable layers were present, especially the quality of
the bitstreams with small layer-IDs (0 - 2) suffered
severely from packet loss, making them completely
undecodable at error rates of 10% or more.

An interesting development could be observed in the highest
layer-ID (5): While the visual quality was nearly on par in
all encoding scenarios at an error rate of 3%, the scalable
encoded sequences did not, in contrast to the standard
settings encode, show a clear negative trend, which would be
expected with increasing error rates. Instead, the perceived
quality at 3% and 10% error rate was much higher than at 5%
and 20% error rate.
This is surprising, as the decodable duration of the
sequences with 10% error rate is lower than that of the ones
with 5%. The main reason for the low scores is most likely
the overall loss of color information in both scalable
encodes with 5% error rate, which is not present in the
encodes with 10% errors. An impression of some typical error
patterns in the sequences is given in figure 11.




Figure 11: Different artifacts due to packet loss. 

6.2 Comparison of MPEG-4 SVC to MPEG-4 
AVC/ASP 

6.2.1 Quality comparison test 

When looking at the quality comparison test, basically sim- 
ilar results could be observed in both subjective and objec- 
tive testing. The overall visual quality of the three tested 
CODECs in the evaluated scenarios leads to the following 
ranking: 




[Plot; axis: Y-PSNR [dB]]

Figure 12: Objective quality results of the quality
comparison test.

These results are not surprising, considering that Xvid is
by far the oldest CODEC in the comparison, while the
performance of MPEG-4 SVC, if the standard settings are
employed, is expected to be similar to that of x264, because
SVC is directly based on the MPEG-4 AVC standard. Small
differences in quality are attributable to the higher
technical maturity and optimizations of the x264 CODEC in
this case.

The only significant difference is that quality variations
show a higher amplitude in the subjective evaluation than
in the objective one. This becomes especially visible when
looking at the results of the Xvid CODEC, whose objective
score still reached 93.5%, while its subjective mark is
only 65.2%. The same phenomenon was already observable
in the CGS / MGS assessment.

[Plot; axis: ITU-R averaged mark]

Figure 13: Subjective quality results of the quality
comparison test.

During the quality comparison, a particular flaw in the SVC
CODEC became apparent: the rate control. Even though
the requested bitrate is delivered quite accurately in most
cases, the resulting quality can be unstable under certain
conditions. Figure 14 shows the Y-PSNR values over the
whole 'Crew' sequence for x264 and SVC.

[Plot; axis: Frame number]

Figure 14: Comparison of Y-PSNR of the 'Crew'
sequence (top: x264, bottom: SVC).


While the maximum fluctuation amplitude of x264 is about
5 dB, the SVC CODEC reaches about 10 dB. Even though
the PSNR inherently fluctuates due to the hierarchical
B-frame structure inside a GOP, this usually accounts for
only about 3 dB of fluctuation. There is also another
significant flaw in the SVC rate control: In certain short
sequences, the CODEC tends to allocate too much bitrate at
the beginning of the sequence. This is followed by an
excessive increase of quantization at the end of the file to
keep the bitrate inside the given boundaries. Of course,
visual quality suffers significantly from this non-uniform
bitrate distribution, as short subjective assessments
showed.
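
To make the term "fluctuation amplitude" concrete, the following sketch computes per-frame Y-PSNR values from mean squared errors and takes the difference between the highest and the lowest value. It assumes 8-bit luma samples (peak value 255) and uses made-up MSE values; it is not the measurement tool used in this work.

```python
import math


def psnr_db(mse, peak=255.0):
    """PSNR in dB from the mean squared error (8-bit samples assumed)."""
    return 10.0 * math.log10(peak * peak / mse)


def fluctuation_amplitude(psnr_values):
    """Difference between the best and the worst frame of a sequence."""
    return max(psnr_values) - min(psnr_values)


# Hypothetical per-frame luma MSE values, chosen for illustration only.
mse_per_frame = [20.0, 25.0, 40.0, 65.0, 30.0, 200.0]

y_psnr = [psnr_db(mse) for mse in mse_per_frame]
amplitude = fluctuation_amplitude(y_psnr)

print(f"Y-PSNR fluctuation amplitude: {amplitude:.1f} dB")
```

Because the PSNR is logarithmic, a tenfold increase of the MSE between the best and the worst frame corresponds to an amplitude of 10 dB, which is the order of magnitude reported for the SVC CODEC above.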

It is noteworthy, however, that this particular behavior did
not occur in every sequence; the 'Foreman' and 'City'
sequences, for example, are not affected. Further research
would be necessary to locate the exact cause of this
problem.



6.2.2 Encoding speed test 

The encoding time is measured on two different test systems
to evaluate the impact of different CPU speeds and
capabilities on SVC encoding. The details of both test
systems are listed in tables 12 and 13.



OS              Microsoft Windows Vista Business 64-Bit, Version: 6.0.6001 SP1
CPU             Intel® Core™ 2 Quad Q9450, 4x2.66 GHz
RAM             4096 MB DDR2 800, Timings: 5-5-5-18
BIOS            American Megatrends Inc. V1.8, 24.01.2008
HDD             Samsung Spinpoint T166, 320 GB, 7200 RPM, 16 MB Cache
Video Adapter   NVIDIA GeForce 8800 GTS 512


Table 12: Hardware configuration for test system 1. 



OS              Microsoft Windows Vista Business 32-Bit, Version: 6.0.6001 SP1
CPU             Intel® Core™ 2 Duo T5500, 2x1.66 GHz @ 1.00 GHz
RAM             2048 MB DDR2 667, Timings: 5-5-5-15
BIOS            Phoenix Technologies LTD 23YA, 17.04.2007
HDD             Hitachi Travelstar 5K100, 100 GB, 5400 RPM, 8 MB Cache
Video Adapter   ATI Radeon Xpress 200M


Table 13: Hardware configuration for test system 2. 



The following tables show the detailed results for both test 
systems. Both the absolute times and the relative speedup 
with System 2 as reference are given. 







CIF        Xvid    x264    SVC
System 1   1.1     0.9     387.7
System 2   4.1     7.2     947.8

4CIF       Xvid    x264    SVC
System 1   12.1    7.8     2778.5
System 2   42.1    60.8    8155.4

HD         Xvid    x264    SVC
System 1   15.3    7.7     3902.1
System 2   41.6    60.1    9575.8

Table 14: Average encoding time for CIF, 4CIF and
HD resolutions on different computer systems in
seconds.



As table 14 shows, there are significant differences in
speedup between the different CODECs.
SVC seems to profit only from the higher core clock of
system 1, as its encoding time scales roughly linearly with
the core clock
($\frac{1.00\,\mathrm{GHz}}{2.66\,\mathrm{GHz}} \approx 0.376$).
The Xvid speedup is slightly higher, possibly due to
optimizations for the new SSE instruction sets implemented
in the quad-core processors. The biggest speed gain can be
observed with the x264 CODEC. This is because x264 was the
only CODEC that supported multithreaded encoding at the time
of testing, so the quad-core processor could be used to its
full potential. It has to be mentioned that the new 1.2.1
version of Xvid also supports multithreaded encoding, so its
speedup can be expected to be on par with x264.
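
The relation between the measured times and the clock ratio can be checked directly from Table 14. The sketch below computes the ratio of the encoding times (System 1 divided by System 2, so smaller values mean a larger gain on System 1) for each CODEC and compares the average with the clock ratio of about 0.376. The data layout and names are illustrative.

```python
# Average encoding times in seconds, copied from Table 14.
TIMES = {
    "CIF": {
        "System 1": {"Xvid": 1.1, "x264": 0.9, "SVC": 387.7},
        "System 2": {"Xvid": 4.1, "x264": 7.2, "SVC": 947.8},
    },
    "4CIF": {
        "System 1": {"Xvid": 12.1, "x264": 7.8, "SVC": 2778.5},
        "System 2": {"Xvid": 42.1, "x264": 60.8, "SVC": 8155.4},
    },
    "HD": {
        "System 1": {"Xvid": 15.3, "x264": 7.7, "SVC": 3902.1},
        "System 2": {"Xvid": 41.6, "x264": 60.1, "SVC": 9575.8},
    },
}

# Core clock of System 2 divided by core clock of System 1.
CLOCK_RATIO = 1.00 / 2.66


def time_ratio(resolution, codec):
    """Encoding time on System 1 relative to System 2 (reference)."""
    systems = TIMES[resolution]
    return systems["System 1"][codec] / systems["System 2"][codec]


def mean_time_ratio(codec):
    """Time ratio averaged over all tested resolutions."""
    ratios = [time_ratio(resolution, codec) for resolution in TIMES]
    return sum(ratios) / len(ratios)


for codec in ("Xvid", "x264", "SVC"):
    print(f"{codec}: {mean_time_ratio(codec):.3f} (clock ratio {CLOCK_RATIO:.3f})")
```

With these numbers, the SVC ratio stays close to the clock ratio, the Xvid ratio is somewhat lower, and the x264 ratio is by far the lowest, which is consistent with the interpretation given above.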



7. CURRENT SVC FLAWS 



7.1 Improvement of existing features 

While the new MPEG-4 SVC CODEC adds many useful 
features to its predecessor MPEG-4 AVC, some flaws could 
still be observed during the subjective as well as the objec- 
tive evaluations. These are described in the next section. 



7.1.1 More reasonable default configuration

Some parameters of the SVC configuration files are not
reasonably set by default. The most important is
'BaseLayerMode', whose default value is '3'. This is not
even a defined setting, as the only possible choices
are '0' (= 'AVC compatible base layer with larger DPB
size'), '1' (= 'AVC compatible base layer') or '2' (= 'AVC
compatible base layer with sub-sequence SEI messages for
supporting temporal scalability'). As a result, it is proposed
to change the currently undefined value of '3' to an allowed
one. The value '2' is used during all the assessments in this
work, as it is the most advanced of all.

Although being allowed and defined, the value of '1' for 
the setting 'GOPSize' is also not reasonable, as it heavily 
cripples the amount of temporal scalability possible. To 
understand this, it is necessary to explain the changes in 
coding structure that come along with the size of the GOP: 



GOPSize   Coding structure
1         IPPPPPPPPPPPPPPPPP...
2         IB PB PB PB PB PB PB PB PB ...
4         IBBB PBBB PBBB PBBB PB ...
8         IBBBBBBB PBBBBBBB PB ...
16        IBBBBBBBBBBBBBBB PB ...



Table 15: Coding structure for different GOP sizes. 



As the table above shows, the default 'GOPSize' value of '1'
means that no B-frames are used at all. On the positive
side, as the previous tests have shown, this leads to a
shorter encoding time, but it also leaves only one temporal
level without any scaled substreams. Hence, changing the
default parameter to a value of '8' or '16' is proposed.
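
The relation between the GOP size, the coding structure and the number of temporal levels can be sketched as follows. The frame-type pattern follows the notation of Table 15; the number of temporal levels assumes a dyadic hierarchical B-frame structure, in which each doubling of the GOP size adds one level. Both functions are illustrative helpers and not part of the SVC reference software.

```python
def coding_structure(gop_size, groups=3):
    """Frame-type pattern in the notation of Table 15."""
    b_frames = "B" * (gop_size - 1)
    pattern = ["I" + b_frames] + ["P" + b_frames for _ in range(groups - 1)]
    # Table 15 prints GOP size 1 without spaces between the frames.
    separator = "" if gop_size == 1 else " "
    return separator.join(pattern)


def temporal_levels(gop_size):
    """Number of temporal levels, assuming dyadic hierarchical B-frames."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("GOP size must be a power of two")
    # For powers of two, bit_length() equals log2(gop_size) + 1.
    return gop_size.bit_length()


for size in (1, 2, 4, 8, 16):
    print(size, temporal_levels(size), coding_structure(size))
```

Under this assumption, a GOP size of 1 yields a single temporal level, while the proposed values of 8 and 16 yield four and five levels, respectively.
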
Because the encoding speed of SVC is currently low, the
default value '0' (= 'BlockSearch') of 'SearchMode' is
also not considered reasonable, as the quality gain
of the block search algorithm is only marginal compared to
the considerably higher encoding time it requires. In order
to significantly improve the encoding speed, the default
value of 'SearchMode' should be switched to '4'
(= 'FastSearch').


7.1.2 Improve encoding speed

The previous tests have shown that the current MPEG-4 SVC
version has a much lower encoding speed than the other
tested CODECs. It needs to be mentioned again that this is
to be expected, as SVC is still in development.
Nevertheless, two main reasons can be identified, which are
explained in the following.

Multithreading. The benefit of multithreading support
becomes more and more visible in modern computer systems,
because multicore configurations are already commonly found
in private environments today. If a speed gain similar to
that of x264 with multithreading is assumed, the encoding
speed would increase approximately linearly with the number
of available CPUs. Although this increase would still not be
sufficient to keep up with the other CODECs, it would
obviously be a step in the right direction.

The main challenge in this process would be a reasonable
parallelization of the encoding steps to correctly and
effectively split the work among the available CPUs, which
would especially concern the motion estimation process, as
the evaluation has shown.

Performance improvements of the motion estimation.

To further decrease the encoding time, it would be essential
to optimize the performance of the motion estimation
algorithms. As already noted in [16], the currently employed
motion estimation technique achieves the best quality
possible. However, its computational complexity is very
high, which prevents practical use. [16] also proposes
a fast mode decision algorithm for inter-frame coding as a
solution, which relies on the mode-distribution correlation
between the base and enhancement layers. Using this
algorithm, an average encoding time reduction of 53% can be
achieved, while the visual quality and bitrate only suffer
minor impacts.

The encoding speed as well as the usability would highly
benefit if the proposed or similar motion estimation
algorithms were included in the SVC CODEC.

7.1.3 Enhanced, stable rate control mechanism

As shown in the synthesis of the CODEC comparison
evaluation, the SVC rate control feature still has minor
flaws, which manifest in two ways:

Firstly, the sequences encoded using rate control have
a much more unstable PSNR value, resulting in quality
fluctuations.

Secondly, some sequences show severe quality degradation
in the last frames, which is presumably also caused by
the inability of the rate control to correctly adjust the
bitrate throughout the whole sequence.



Figure 15: Rate control introduced blocking arti- 
facts at the end of a sequence. 



Because the exact reasons for these behaviors could not be 
precisely pinpointed in the tests, no concrete proposal for 
improvement can be given here. Still, improvements in this 
area are regarded as necessary. 

7.2 Additional useful features 

In the next section, additional features, that are not imple- 
mented in the current SVC release, but would be useful, are 
described. 

7.2.1 Variable, content-dependent framerate

As scalable video technology is especially advantageous in
streaming media environments, a useful new technique,
which is already used by other video CODECs, would be
a content-aware dynamic framerate.
An example of a successful implementation of variable
framerate is the Blackbird CODEC used in the FORscene
system developed by Forbidden Technologies plc, which is
optimized for video transmission over heterogeneous
networks. Because the CODEC is fully proprietary, no further
information can be given here.

The basic idea of a variable, content-dependent framerate is
that a reduced temporal level does not impair scenes with
no or very low movement, which was already shown by [§].
Reducing the framerate could have two main positive
results: Either the file size of the video sequence could be
reduced, or, if the size remains constant, the SNR quality
would benefit accordingly.

7.2.2 2-Pass encoding mode 

2-pass encoding strategies have been implemented in most
modern CODECs, for example Xvid and x264, which have
been examined earlier. 2-pass encoding works by first
analyzing the video's complexity (first pass); after that,
the available bitrate is distributed dynamically to achieve
maximum quality (second pass). This is especially useful for
archiving purposes, as high bitrate 'spikes' are not of
concern there. In contrast, when using 2-pass mode for
streaming applications, special care has to be taken not to
overload the connection. This can be done on the client side
or while encoding the video. On the client side, large
differences in video bitrate can be compensated by using
large buffers. Of course, this also has downsides: First,
filling these buffers can take a certain amount of time, so
the user has to wait before the requested video starts.
Second, the memory required for storing a large number of
video frames is not always available, especially in highly
mobile devices. If the problem of bitrate spikes is
addressed while encoding, a threshold value has to be
defined as an absolute maximum bitrate, so that the
bandwidth of the connection is always capable of delivering
the video stream.
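
The encoder-side variant can be illustrated with a small sketch. It assumes that the first pass yields one positive complexity value per segment of the video and that the second pass distributes a fixed bit budget proportionally to these values, while no segment may exceed a given maximum. This is a simplified illustration of the principle, not the rate control of Xvid or x264; all names and numbers are hypothetical.

```python
def allocate_bits(complexities, total_bits, max_bits_per_segment):
    """Distribute total_bits proportionally to the first-pass complexity,
    capping every segment at max_bits_per_segment."""
    count = len(complexities)
    if total_bits > count * max_bits_per_segment:
        raise ValueError("budget cannot be spent without exceeding the cap")

    allocation = [0.0] * count
    open_segments = set(range(count))
    remaining = float(total_bits)

    while open_segments:
        weight = sum(complexities[i] for i in open_segments)
        share = {i: remaining * complexities[i] / weight for i in open_segments}
        capped = [i for i in open_segments if share[i] > max_bits_per_segment]
        if not capped:
            # No spike left: accept the proportional shares.
            for i in open_segments:
                allocation[i] = share[i]
            break
        # Clamp the spikes and redistribute the rest among the others.
        for i in capped:
            allocation[i] = float(max_bits_per_segment)
            remaining -= max_bits_per_segment
            open_segments.remove(i)

    return allocation


# Hypothetical first-pass result: the third segment is very complex.
first_pass = [1.0, 1.0, 6.0, 2.0]
allocation = allocate_bits(first_pass, total_bits=1000, max_bits_per_segment=400)
print(allocation)  # [150.0, 150.0, 400.0, 300.0]
```

Without the cap, the third segment would receive 600 of the 1000 bits. With the cap, the surplus is passed on to the remaining segments, so the total budget is still used while the connection is never asked to deliver more than the defined maximum.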

Implementing this feature into SVC would primarily benefit 
its suitability for archiving storage. Of course, the poor rate 
control of SVC would also benefit from the bitrate distribu- 
tion algorithms in 2-pass mode. In spite of this fact, it is 
essential that single pass rate control of SVC is improved, 
as 2-pass encoding mode is not suited for realtime encoding. 

7.3 Conclusion 

The extensive tests conducted in this work showed that the 
new scalable video coding extensions provide significant im- 
provement in terms of adaptability of the video stream. This 
is especially important in the modern, heterogeneous net- 
work conditions caused by the growing number of mobile 
multimedia devices. In contrast, there is also a growing de- 
mand for high quality digital video, mostly for the emerging 
high definition television standard. Using the scalability fea- 
tures of SVC, both of these demands can be met simultane- 
ously, while at the same time saving bitrate compared to the 
storage of separate videos tailored for each device. Further, 
using the combination of media aware network components 
and IP multicasting could provide an enormous potential for 
saving upstream bandwidth for video servers. 
However, there are also several features that still need
improvement. First and foremost, the encoding speed of the
SVC reference encoder is far too low. Two methods to
speed up the encoding have been proposed above. The
successful acceleration of the encoding process is by far the
most pressing matter, as usage at current speed levels is not
feasible on a large scale. Additionally, several optimizations
and other useful new features are proposed in the previous
section.

Concluding, SVC is a promising new extension to the MPEG 
CODEC family. If the most severe issues are addressed, it 
is likely to significantly improve the viewing experience of 
digital video consumers. 